A minimal-resource transliteration framework for Vietnamese
Abstract
Transliteration converts words in a source language (e.g., English) into phonetically equivalent words in a target language (e.g., Vietnamese). Transliteration is therefore used to handle out-of-vocabulary (OOV) words adopted from foreign languages in automatic speech recognition and keyword search systems. While statistical transliteration approaches have been widely adopted, they may not always be suitable for under-resourced languages, where training data is scarce. In this work, we present a rule-based Vietnamese transliteration framework suitable for spoken language applications with minimal linguistic resources. We show that the proposed system outperforms statistical baselines by up to 81.70% relative when training examples are limited (94 word pairs). In addition, we investigate the trade-off between training corpus size and transliteration performance for different methods on two distinct corpora. We also show that the proposed model outperforms statistical baselines by up to 36.76% relative in keyword search tasks.
Similar resources
Phonology-augmented statistical transliteration for low-resource languages
Transliteration converts words in a source language (e.g., English) into phonetically equivalent words in a target language (e.g., Vietnamese). This conversion needs to take into account the phonology of the target language, that is, the rules determining how phonemes can be organized. For example, a transliterated word in Vietnamese that begins with a consonant cluster is phonologically invalid. Whil...
Direct Orthographical Mapping for Machine Transliteration
Machine transliteration/back-transliteration plays an important role in many multilingual speech and language applications. In this paper, a novel framework for machine transliteration/back-transliteration that allows us to carry out direct orthographical mapping (DOM) between two different languages is presented. Under this framework, a joint source-channel transliteration model, also called n-...
A Comparison of Different Machine Transliteration Models
Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models – grapheme-based translit...
A General Method for Creating a Bilingual Transliteration Dictionary
Transliteration is the rendering in one language of terms from another language (and, possibly, another writing system), approximating spelling and/or phonetic equivalents between the two languages. A transliteration dictionary is a crucial resource for a variety of natural language applications, most notably machine translation. We describe a general method for creating bilingual transliterati...
Chinese-English Backward Transliteration Assisted with Mining Monolingual Web Pages
In this paper, we present a novel backward transliteration approach which can further assist the existing statistical model by mining monolingual web resources. Firstly, we employ the syllable-based search to revise the transliteration candidates from the statistical model. By mapping all of them into existing words, we can filter or correct some pseudo candidates and improve the overall recall...
Publication date: 2014